Uncertain Reward-Transition MDPs for Negotiable Reinforcement Learning
Similar resources
A Geometric Traversal Algorithm for Reward-Uncertain MDPs
Markov decision processes (MDPs) are widely used in modeling decision making problems in stochastic environments. However, precise specification of the reward functions in MDPs is often very difficult. Recent approaches have focused on computing an optimal policy based on the minimax regret criterion for obtaining a robust policy under uncertainty in the reward function. One of the core tasks i...
Building Relational World Models for Reinforcement Learning
Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capacity to exploit the relational nature of the domain. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward state...
Exploring compact reinforcement-learning representations with linear regression
This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the computational and storage complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning represe...
Approximate Policy Iteration for Markov Control Revisited
Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy iteration. In this paper, we present and analyze an API a...
Solving Markov Decision Processes via Simulation
This chapter presents an overview of simulation-based techniques useful for solving Markov decision problems/processes (MDPs). MDPs are problems of sequential decision-making in which decisions made in each state collectively affect the trajectory of the states visited by the system — over a time horizon of interest to the analyst. The trajectory in turn, usually, affects the performance of the...